Support atomic CTAS and RTAS with SparkSessionCatalog #1183
Merged
rdblue merged 3 commits into apache:master on Jul 9, 2020
Contributor
+1 LGTM
Contributor, Author
I ran tests locally because CI is running behind. Everything looks good, so I'll merge this. Thanks for reviewing, @danielcweeks!
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request on Aug 19, 2020
Hello @rdblue! According to this PR, SparkSessionCatalog is also atomic for CTAS and RTAS. However, the documentation says "CTAS is supported, but is not atomic when using SparkSessionCatalog." (https://iceberg.apache.org/docs/latest/spark-ddl/#create-table-as-select). Which one is correct?

This adds support for atomic CTAS and RTAS commands when using SparkSessionCatalog in Spark 3.
If a TableCatalog in Spark 3 implements StagingTableCatalog, then all CTAS/RTAS operations will use the staging table methods, assuming that all tables in the catalog support the same capabilities. Iceberg tables support atomic operations, but tables loaded by the wrapped session catalog do not. The work-around is to mimic Spark's non-atomic behavior by creating a table immediately, using it for the write, and rolling back by dropping the table.
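To make the two paths concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use Spark's or Iceberg's real interfaces: `SketchCatalog`, `StagedCreate`, `AtomicStagedCreate`, and `RollbackStagedCreate` are hypothetical names invented for illustration. It assumes a staged change exposes write, commit, and abort steps, and contrasts an atomic create (nothing visible until commit) with the rollback-style create described above (table created immediately, dropped on abort).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory catalog: table name -> contents ("" until data is written).
class SketchCatalog {
  private final Map<String, String> tables = new HashMap<>();

  void createTable(String name) {
    if (tables.putIfAbsent(name, "") != null) {
      throw new IllegalStateException("Table already exists: " + name);
    }
  }

  void put(String name, String contents) { tables.put(name, contents); }
  boolean dropTable(String name) { return tables.remove(name) != null; }
  boolean tableExists(String name) { return tables.containsKey(name); }
  String contents(String name) { return tables.get(name); }
}

// Hypothetical stand-in for a staged CTAS: write data, then commit or abort.
interface StagedCreate {
  void write(String rows);
  void commit();
  void abort();
}

// Atomic path (the Iceberg-table case): data is buffered and the table
// only becomes visible in the catalog when commit() succeeds.
class AtomicStagedCreate implements StagedCreate {
  private final SketchCatalog catalog;
  private final String name;
  private String pending = "";

  AtomicStagedCreate(SketchCatalog catalog, String name) {
    this.catalog = catalog;
    this.name = name;
  }

  public void write(String rows) { pending += rows; }

  public void commit() {
    catalog.createTable(name);
    catalog.put(name, pending);
  }

  public void abort() { pending = ""; } // nothing was published, nothing to undo
}

// Rollback path (the wrapped-session-catalog case): the table is created
// immediately, used for the write, and dropped if the operation is aborted.
class RollbackStagedCreate implements StagedCreate {
  private final SketchCatalog catalog;
  private final String name;

  RollbackStagedCreate(SketchCatalog catalog, String name) {
    this.catalog = catalog;
    this.name = name;
    catalog.createTable(name); // visible to other readers right away
  }

  public void write(String rows) { catalog.put(name, catalog.contents(name) + rows); }

  public void commit() { } // the table already exists; nothing left to publish

  public void abort() { catalog.dropTable(name); } // roll back by dropping
}
```

In this sketch, a failed write on the rollback path can briefly expose an empty or partial table before `abort()` drops it, which is the non-atomic behavior the description refers to; the atomic path never exposes the table until `commit()`.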
This PR doesn't contain new tests because the session catalog in Spark 3 does not work with v2 tables. It will always return a V1Table. Because a v1 table is always returned, there are no code paths that will load non-Iceberg tables using the session catalog. When the provider for a table is not a v2 provider, Spark will bypass the v2 plugin. A plugin can define and load v2 tables, but v2 will never be used for tables loaded by the wrapped session catalog.